
Fix search_graph query= multi-minute latency: two-step FTS5 subquery #302

Closed
awconstable wants to merge 1 commit into DeusData:main from arbor-education:fix/301-bm25-fts5-early-termination

awconstable (Contributor) commented Apr 30, 2026

Fixes #301

Root cause

search_graph with a query= argument uses SQLite FTS5 for BM25-ranked full-text search. The previous flat query:

SELECT ... FROM nodes_fts
JOIN nodes n ON n.id = nodes_fts.rowid
WHERE nodes_fts MATCH ?
  AND n.project = ?
  AND n.label NOT IN ('File','Folder',...)
ORDER BY bm25(nodes_fts) LIMIT 20

blocks FTS5's WAND/MaxScore early-exit optimisation. FTS5 can short-circuit ORDER BY bm25() LIMIT N only when it drives the entire query plan. The outer JOIN + WHERE n.project = ? predicate is invisible to the FTS5 planner — it must score every matching document before the outer filter can discard any of them. On a large codebase with 100K+ matches this causes 2–16 minute queries.

The same problem applied to the count query, making each search_graph call pay the full scan cost twice.

Changes

Two-step subquery (bm25_search in src/mcp/mcp.c)

The inner FTS5-only subquery has no outer predicates, so SQLite CAN early-terminate it:

SELECT ...
FROM (
    SELECT rowid, bm25(nodes_fts) AS base_rank
    FROM nodes_fts WHERE nodes_fts MATCH ?1
    ORDER BY base_rank LIMIT 2000          -- FTS5 early-terminates here
) fts
JOIN nodes n ON n.id = fts.rowid
WHERE n.project = ?2
  AND n.label NOT IN ('File','Folder',...)
ORDER BY rank LIMIT ?3 OFFSET ?4

The count query uses the same inner-limit subquery structure.
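To make the two query shapes concrete, here is a minimal sketch that runs both against a toy in-memory database. The simplified `nodes` / `nodes_fts` schema, the sample rows, and the project name `demo` are assumptions for illustration only, not the project's real schema, and the sketch assumes Python's `sqlite3` module is built with FTS5. It only shows that the two shapes return the same rows when the inner limit is larger than the number of matches; it does not measure performance.

```python
import sqlite3

INNER_LIMIT = 2000  # same value the PR gives for BM25_INNER_LIMIT

def build_db():
    """Create a toy in-memory database (hypothetical, simplified schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, project TEXT, label TEXT);
        CREATE VIRTUAL TABLE nodes_fts USING fts5(name);
        """
    )
    rows = [
        (1, "demo", "Function", "approve integration request"),
        (2, "demo", "File", "approve.c"),
        (3, "other", "Function", "approve user"),
        (4, "demo", "Method", "approve approve school apps"),
        (5, "demo", "Function", "list users"),
    ]
    for node_id, project, label, name in rows:
        db.execute("INSERT INTO nodes VALUES (?, ?, ?)", (node_id, project, label))
        db.execute("INSERT INTO nodes_fts(rowid, name) VALUES (?, ?)", (node_id, name))
    return db

# Flat shape: the FTS5 match, the join, and the filters share one query level.
FLAT_SQL = """
SELECT n.id
FROM nodes_fts
JOIN nodes n ON n.id = nodes_fts.rowid
WHERE nodes_fts MATCH ?
  AND n.project = ?
  AND n.label NOT IN ('File', 'Folder')
ORDER BY bm25(nodes_fts) LIMIT ?
"""

# Two-step shape: an FTS5-only subquery picks the top candidates,
# then the outer query joins and filters just those rows.
TWO_STEP_SQL = """
SELECT n.id
FROM (
    SELECT rowid, bm25(nodes_fts) AS base_rank
    FROM nodes_fts WHERE nodes_fts MATCH ?
    ORDER BY base_rank LIMIT ?
) fts
JOIN nodes n ON n.id = fts.rowid
WHERE n.project = ?
  AND n.label NOT IN ('File', 'Folder')
ORDER BY fts.base_rank LIMIT ?
"""

db = build_db()
flat_ids = [r[0] for r in db.execute(FLAT_SQL, ("approve", "demo", 20))]
two_step_ids = [
    r[0] for r in db.execute(TWO_STEP_SQL, ("approve", INNER_LIMIT, "demo", 20))
]
print(flat_ids, two_step_ids)
```

The results can differ only when more than `INNER_LIMIT` rows match the query, because rows ranked below the inner cutoff never reach the outer filters.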

Trade-off: total in the response is now capped at BM25_INNER_LIMIT (2000) — it reflects how many of the top 2000 BM25 candidates passed the project/label filters, not the full matching node count. For a code search tool, getting the top 20 most relevant results in 500ms is far more useful than an exact count after 16 minutes.
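The capped total can be illustrated with a small sketch. Everything here is an assumption for demonstration: the toy schema, the generated rows, and an inner limit of 10 instead of 2000 so that the cap is visible on 50 matching rows. It also assumes Python's `sqlite3` module is built with FTS5.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, project TEXT);
    CREATE VIRTUAL TABLE nodes_fts USING fts5(name);
    """
)

# 50 matching rows of increasing length, so shorter rows get a better BM25 rank.
# Even ids belong to project 'demo', odd ids to project 'other'.
for i in range(1, 51):
    project = "demo" if i % 2 == 0 else "other"
    db.execute("INSERT INTO nodes VALUES (?, ?)", (i, project))
    db.execute(
        "INSERT INTO nodes_fts(rowid, name) VALUES (?, ?)",
        (i, "approve" + " pad" * i),
    )

# 60 rows that do not contain the search term at all.
for i in range(51, 111):
    db.execute("INSERT INTO nodes VALUES (?, ?)", (i, "demo"))
    db.execute("INSERT INTO nodes_fts(rowid, name) VALUES (?, ?)", (i, "list users"))

# Count over the inner-limited candidates only (the shape the PR describes).
CAPPED_COUNT_SQL = """
SELECT COUNT(*)
FROM (
    SELECT rowid, bm25(nodes_fts) AS base_rank
    FROM nodes_fts WHERE nodes_fts MATCH ?
    ORDER BY base_rank LIMIT ?
) fts
JOIN nodes n ON n.id = fts.rowid
WHERE n.project = ?
"""

# Exact count over every match, for comparison.
EXACT_COUNT_SQL = """
SELECT COUNT(*)
FROM nodes_fts
JOIN nodes n ON n.id = nodes_fts.rowid
WHERE nodes_fts MATCH ? AND n.project = ?
"""

capped_total = db.execute(CAPPED_COUNT_SQL, ("approve", 10, "demo")).fetchone()[0]
exact_total = db.execute(EXACT_COUNT_SQL, ("approve", "demo")).fetchone()[0]
print(capped_total, exact_total)
```

On this toy data the capped count covers only the `demo` rows among the ten best-ranked candidates, while the exact count covers all `demo` matches, so the reported total is a lower bound rather than the true number of matching nodes.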

Benchmark

Tested on a large codebase (~200K nodes, ~500MB database):

| Query | Before | After | Speedup |
|---|---|---|---|
| query=approve apps authorization school | 18 023 ms | 569 ms | 32× |
| query=Group User Details Manage All Users | 120 036 ms | 508 ms | 236× |
| query=dev portal approve integration third party | 1 015 180 ms | 1 009 ms | 1006× |

The ~500ms floor is cold-start I/O when spawning a fresh process against a ~500MB database. In the long-running MCP server (warm file cache) BM25 queries return in sub-millisecond time.
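For reference, the speedup figures are the before/after ratio of the reported timings, rounded to the nearest integer. A short check using only the numbers from the benchmark above:

```python
# (before_ms, after_ms) pairs copied from the benchmark figures.
timings_ms = {
    "approve apps authorization school": (18_023, 569),
    "Group User Details Manage All Users": (120_036, 508),
    "dev portal approve integration third party": (1_015_180, 1_009),
}

def speedup(before_ms, after_ms):
    """Return the before/after ratio rounded to the nearest integer."""
    return round(before_ms / after_ms)

speedups = {query: speedup(*pair) for query, pair in timings_ms.items()}
for query, factor in speedups.items():
    print(f"{query}: {factor}x")
```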

Tests

All store search tests pass. The MCP test suite has a pre-existing stack buffer overflow in build_project_list_error (unrelated to this change) that kills the test runner before MCP-layer tests run; the store-layer tests all complete cleanly.

Flat BM25 queries of the form:
  SELECT ... FROM nodes_fts JOIN nodes WHERE MATCH ? AND project=? ORDER BY bm25() LIMIT N
block FTS5 WAND/MaxScore early-exit — the outer JOIN+WHERE is invisible to
the FTS5 planner, so it scores every matching document before any filter fires.
On a large codebase with 100K+ matches this causes 2–16 minute queries.

Fix: two-step subquery.  The inner FTS5-only query:
  SELECT rowid, bm25(nodes_fts) FROM nodes_fts WHERE MATCH ? ORDER BY bm25() LIMIT 2000
can early-terminate because no outer predicate blocks it.  The outer query
then joins and filters at most BM25_INNER_LIMIT (2000) candidates.

The count query uses the identical inner-limit subquery, so it benefits too.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DeusData added the bug (Something isn't working) and stability/performance (Server crashes, OOM, hangs, high CPU/memory) labels May 4, 2026
DeusData (Owner) commented

Closing as superseded by #300 — the FTS5 two-step subquery fix you put on this branch was bundled into #300's branch and merged together. Both cdf6e6c (PR #300's HEAD on main as 5f19454) and the name_pattern fixes are now on main. Thanks!

Development

Successfully merging this pull request may close these issues.

search_graph query= takes minutes on large codebases — FTS5 early termination blocked by outer JOIN